A Multi-Aspect Comparison and Evaluation on Thai Word Segmentation Programs
Abstract
Word segmentation is an important task in natural language processing, especially for languages without explicit word boundaries, such as Thai. Many Thai word segmentation programs have been developed, and researchers and developers working with Thai documents usually spend a tremendous amount of time studying and trying them. This paper presents the performance of six Thai word segmentation programs: Libthai, Swath, Wordcut, CRF++, Thaisemantics, and Tlexs. Based on experimental results, we compare these programs in terms of usage, response time, timeouts, and relevance.
Related resources
Thai Word Segmentation Verification Tool
Since Thai has no explicit word boundary, word segmentation is the first thing to do before developing any Thai NLP applications. In order to create large Thai word-segmented corpora to train a word segmentation model, an efficient verification tool is needed to help linguists work more conveniently to check the accuracy and consistency of the corpora. This paper proposes Thai Word Segmentation...
A Lexicalized Tree Adjoining Grammar for Thai
This paper describes an alternative formalism for Thai syntax parsing based on a lexicalized tree adjoining grammar (LTAG). We first briefly present some formal background concerning LTAG, which is necessary for an understanding of LTAG and its application to Thai. Specifically, we address several issues regarding difficulties in parsing Thai sentences and how to resolve these issues using LTAG...
Word Segmentation in Indo-China Languages for Digital Libraries
This chapter introduces word segmentation methods for Indo-China languages. It describes six different word segmentation methods developed for the Thai, Vietnamese, and Myanmar languages and compares different approaches in terms of their algorithms and results achieved. The discussion and comparison of these word segmentation methods will provide underlying views about how word segmentation can...
Dictionary-based Thai CLIR: Experimental Survey of Thai CLIR
This paper describes our participation in Cross-Language Information Retrieval (CLIR) at the Cross-Language Evaluation Forum. Our objectives for this experiment were threefold. Firstly, the coverage of the Thai-bilingual dictionary was evaluated when translating queries. Secondly, we examined whether the segmentation process affected CLIR. Lastly, this research investigates the que...
Thoughts on Word and Sentence Segmentation in Thai
This paper discusses problems of word and sentence segmentation in Thai. Disagreements on word segmentation are caused mostly by compound words. To set a standard resource and tool for word segmentation, we suggest that only simple words and true compound words should be segmented in the process of word segmentation. Other compounds can be grouped later by the same means as multiword identific...